Spoken language translation using automatically transcribed text in training

Authors

Stephan Peitz

Simon Wiesler

Markus Nußbaum-Thom

Hermann Ney

Abstract

In spoken language translation, a machine translation system takes speech as input and translates it into another language. A standard machine translation system is trained on written language data and expects written language as input. In this paper, we propose an approach to close the gap between the output of automatic speech recognition and the input of machine translation by training the translation system on automatically transcribed speech. In our experiments, we show improvements of up to 0.9 BLEU points on the IWSLT 2012 English-to-French speech translation task.
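The data-side idea in the abstract, using automatic transcripts rather than only written text as the source side of the training bitext, can be sketched in Python. This is a minimal illustration under assumptions the abstract does not state: that each training segment has a manual transcript, an ASR transcript of the same audio, and a reference translation, and that the ASR-source pairs are simply added to the written-language pairs. `SpeechSegment` and `build_training_bitext` are hypothetical names, the toy sentences are invented, and the actual training of the translation model is out of scope; the authors' real data preparation may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SpeechSegment:
    """One training segment (hypothetical structure, for illustration only)."""
    manual_transcript: str  # human-written source-language text
    asr_transcript: str     # automatic speech recognition output for the same audio
    translation: str        # target-language reference translation


def build_training_bitext(segments: List[SpeechSegment],
                          keep_manual: bool = True) -> List[Tuple[str, str]]:
    """Return (source, target) sentence pairs for MT training.

    ASR transcripts are always used as source sentences, so the translation
    model sees recognition-style input during training. With keep_manual=True
    the written-language (manual transcript) pairs are kept as well.
    """
    bitext: List[Tuple[str, str]] = []
    for seg in segments:
        # Pair the automatic transcript with the reference translation.
        bitext.append((seg.asr_transcript, seg.translation))
        if keep_manual:
            # Optionally also keep the written-language source sentence.
            bitext.append((seg.manual_transcript, seg.translation))
    return bitext


if __name__ == "__main__":
    # Toy example: the ASR transcript lacks punctuation and casing and
    # contains a recognition error ("word" instead of "world").
    segments = [SpeechSegment("Hello, world.", "hello word", "Bonjour, le monde.")]
    for src, tgt in build_training_bitext(segments):
        print(f"{src}\t{tgt}")
```

Whether to keep the manual-transcript pairs (`keep_manual=True`) or train on the ASR pairs alone is a design choice in this sketch; the abstract does not say which combination was used.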

Related articles

Language Modeling Approach for Retrieving Passages in Lecture Audio Data

Spoken Document Retrieval (SDR) is a promising technology for enhancing the utility of spoken materials. After the spoken documents have been transcribed by using a Large Vocabulary Continuous Speech Recognition (LVCSR) decoder, a text-based ad hoc retrieval method can be applied directly to the transcribed documents. However, recognition errors will significantly degrade the retrieval performa...

Spoken document retrieval by translating recognition candidates into correct transcriptions

This paper proposes an ad hoc retrieval method for spoken documents that uses a statistical translation technique. After transcribing the spoken documents by using a Large-Vocabulary Continuous Speech Recognition (LVCSR) decoder, a text-based ad hoc retrieval method can be directly applied to the transcribed documents. However, recognition errors will significantly degrade the retrieval performa...

Automatic Conversion of Dialectal Tamil Text to Standard Written Tamil Text using FSTs

We present an efficient method to automatically transform spoken language text to standard written language text for various dialects of Tamil. Our work is novel in that it explicitly addresses the problem and need for processing dialectal and spoken language Tamil. Written language equivalents for dialectal and spoken language forms are obtained using Finite State Transducers (FSTs) where spok...

Automatic extraction of bilingual chunk lexicon for spoken language translation

In language communication, an utterance may be segmented as a concatenation of chunks that are reasonable in syntax, meaningful in semantics, and composed of several words. Usually, the order of words within chunks is fixed, and the order of chunks within an utterance is rather flexible. The improvement of spoken language translation could benefit from using bilingual chunks. This paper present...

Inferring linguistic structure in spoken language

We demonstrate the applications of Markov Chains and HMMs to modeling of the underlying structure in spontaneous spoken language. Experiments with supervised training cover the detection of the current dialog state and identification of the speech act as used by the speech translation component in our JANUS Speech-to-Speech Translation System. HMM training with hidden states is used to uncover o...

Publication date: 2012
